Published January 18, 2024
By Kimberly Mann Bruch and Aanika Tipirneni, SDSC Communications
More than 18 million people worldwide are diagnosed with cancer each year, and each case hides many mutations in its genome. Understanding these mutations advances cancer research and informs the development of possible cures, therapies and prevention strategies. A team of researchers recently used the Triton Shared Computing Cluster (TSCC) at the San Diego Supercomputer Center (SDSC) at UC San Diego to develop and test SigProfilerMatrixGenerator, a novel bioinformatics tool that can classify and visualize large-scale mutational events, meaning mutations that affect more than 50 DNA base pairs.
“Our new bioinformatics tool revolutionizes the way we visualize and explore structural variations (SV) and copy number variations (CNV) to decipher genomic anomalies in cancer development,” said Ludmil Alexandrov, an associate professor of bioengineering and cellular and molecular medicine at UC San Diego. “Without access to SDSC’s TSCC, we would not have been able to develop and test SigProfilerMatrixGenerator – we especially utilized the high-performance GPU and CPU computing on the cluster for our work.”
Written in Python and accompanied by an R wrapper, the team’s tool accommodates two classification schemes for SVs and CNVs, allowing cancer specialists to analyze and visualize intricate mutational patterns across various cancer types.
SigProfilerMatrixGenerator condenses CNV mutations into 48 channels and SVs into 32 channels, making it easier to recognize patterns of genetic mutations and identify their specific signatures. The tool was also built to handle large datasets efficiently: it can generate both a CNV and an SV count matrix for thousands of samples in a couple of seconds.
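For readers unfamiliar with the term, a count matrix simply tallies how many events of each channel were observed in each sample. The short Python sketch below illustrates that idea only. It is not SigProfilerMatrixGenerator's API or algorithm; the function name, the channel labels and the sample data are all hypothetical, and the choice to put samples in rows is an assumption made for readability.

```python
from collections import Counter

# Hypothetical channel labels, for illustration only. The real tool
# defines 48 CNV channels and 32 SV channels.
CHANNELS = ["channel_A", "channel_B", "channel_C"]


def build_count_matrix(events, channels):
    """Tally (sample, channel) events into a sample-by-channel matrix.

    Returns (samples, matrix), where matrix[i][j] is the number of
    events assigned to channels[j] that were seen in samples[i].
    """
    counts = Counter(events)  # maps (sample, channel) -> count
    samples = sorted({sample for sample, _ in events})
    matrix = [[counts[(s, c)] for c in channels] for s in samples]
    return samples, matrix


# Made-up events: each tuple is one classified mutational event.
events = [
    ("sample_1", "channel_A"),
    ("sample_1", "channel_A"),
    ("sample_1", "channel_C"),
    ("sample_2", "channel_B"),
]

samples, matrix = build_count_matrix(events, CHANNELS)
for name, row in zip(samples, matrix):
    print(name, row)
```

Each row of the resulting matrix is one sample's profile across the channels, which is the kind of summary that can then be plotted or compared between samples.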
Details of the tool are available in a recently published article titled “Visualizing and Exploring Patterns of Large Mutational Events with SigProfilerMatrixGenerator” in the journal BMC Genomics.
This work was supported by the U.S. National Institutes of Health (grant nos. R01ES030993-01A1, R01ES032547-01 and R01CA269919-01), the Cancer Research UK Grand Challenge (award no. C98/A24032) and a Packard Fellowship for Science and Engineering.
Information about SDSC’s TSCC can be found on the SDSC TSCC webpage.